Enabling efficient stencil code generation in OpenACC

Authors

  • Alyson D. Pereira
  • Rodrigo C. O. Rocha
  • Márcio Bastos Castro
  • Luís F. W. Góes
  • Mario A. R. Dantas

Abstract

The OpenACC programming model simplifies programming for accelerator devices such as GPUs. Its abstract accelerator model defines a least common denominator for accelerator devices; thus, it cannot represent the architectural specifics of these devices without losing portability. As a result, this general-purpose approach delivers good performance on average, but it misses optimization opportunities for code generation and execution of specific classes of applications. In this paper, we propose stencil extensions to enable efficient code generation in OpenACC. Our results show that our stencil extensions may improve the performance of OpenACC by up to 28% and 45% on GPU and CPU, respectively.
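
For orientation, here is a minimal sketch of the kind of loop this abstract is concerned with: a 5-point Jacobi stencil written with standard OpenACC directives only. It is not the paper's proposed extension syntax, which the abstract does not give. The function name `jacobi_step`, the row-major layout, and the choice of a 5-point stencil are illustrative assumptions. With plain directives, nothing marks the neighbor access pattern as a stencil; the compiler sees only a generic collapsed loop nest, which is the kind of lost information the abstract argues a stencil-aware extension could exploit.

```c
#include <assert.h>

/* Hypothetical example: one sweep of a 5-point Jacobi stencil on an
 * n x m row-major grid, using standard OpenACC directives only
 * (NOT the stencil extensions proposed in the paper).
 * Only interior points are written; boundary values of `out` are kept
 * because `out` is transferred with copy (to the device and back). */
void jacobi_step(int n, int m, const double *restrict in, double *restrict out)
{
    /* A stencil needs at least one interior point. */
    assert(n >= 3 && m >= 3);

    #pragma acc parallel loop collapse(2) copyin(in[0:n*m]) copy(out[0:n*m])
    for (int i = 1; i < n - 1; i++) {
        for (int j = 1; j < m - 1; j++) {
            /* Average of the four axis-aligned neighbors. */
            out[i * m + j] = 0.25 * (in[(i - 1) * m + j] + in[(i + 1) * m + j]
                                   + in[i * m + j - 1] + in[i * m + j + 1]);
        }
    }
}
```

Compiled without OpenACC support, the pragma is ignored and the loop runs serially; with an OpenACC compiler (for example GCC with `-fopenacc`, or the NVIDIA HPC SDK with `-acc`) it may be offloaded. In an iterative solver one would typically swap `in` and `out` between sweeps and enclose the sweeps in a `#pragma acc data` region so the arrays stay resident on the device.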


Similar sources

Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...


Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model

In this work we use the GPU porting task for the operative Japanese weather prediction model “ASUCA” as an opportunity to examine productivity issues with OpenACC when applied to structured grid problems. We then propose “Hybrid Fortran”, an approach that combines the advantages of directive based methods (no rewrite of existing code necessary) with that of stencil DSLs (memory layout is abstra...


Autotuning Tensor Contraction Computations on GPUs

We describe a framework for generating optimized GPU code for computing tensor contractions, a multidimensional generalization of matrix-matrix multiplication that arises frequently in computational science applications. Typical performance optimization strategies for such computations transform the tensors into sequences of matrix-matrix multiplications to take advantage of an optimized BLAS l...


Automatic Stencil Code Generation - Ph.D. Thesis Proposal

Stencil-based kernels constitute the core of many scientific applications on block-structured grids. These calculations form the basis for a wide range of scientific applications from simple Jacobi iterations to complex multigrid and block structured adaptive PDE solvers. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and ...


Generating Efficient Parallel Programs for Distributed Memory Systems

Leveraging the performance of distributed and shared memory clusters in scientific computing is challenging in terms of programmability and efficiency. The dimensions of the problem are data distribution, computation distribution, efficient communications and the ease of programming. To address those dimensions in a balanced manner, we present a directive-based programming model for hybrid dist...



Publication date: 2017